🥋 Karate 🥰 ❄️ Snowflake

🇫🇷 Ne ratez pas votre SQL avec un test Karate

🇬🇧 Don’t miss your SQL code with a Karate test

🪄 Once upon a time…​ ✨

A great data pipeline !

data pipeline simple

$ whoami 👨‍💻

A great pipeline…​ but sometimes…​ 😥

Not so great…​ more complex

data pipeline dirty data pipeline complex

💡We need Tests 🤗 & QA 🕵️ 🐛

But also some tools…​

Let’s talk about architecture 👷

ETL vs ELT…​

snowflake text

  • Data Cloud…​ DBMS++

  • Snowpark, Stream/Task…​

  • STORAGE 💰 dissociated from COMPUTE 💰💰💰

  • Scalable COMPUTE

kapoeira text

Kapoeira - Example (1/2)

burger factory

Kapoeira - Example (2/2)

Feature: Burger 🍔 feature

  Background:
    Given input topic
      | topic     | alias        | key_type | value_type |
      | bread     | bread-in     | string   | string     |
      | vegetable | vegetable-in | string   | string     |
      | meat      | meat-in      | string   | string     |
    And output topic
      | topic  | alias      | key_type | value_type | readTimeoutInSecond |
      | burger | burger-out | string   | string     | 5                   |
    And var uuid = call function: uuid

  Scenario Outline: Burger Factory
    When records with key and value are sent
      | topic_alias  | key        | value       |
      | bread-in     | 🧑‍🍳_${uuid} | <bread>     |
      | vegetable-in | 🧑‍🍳_${uuid} | <vegetable> |
      | meat-in      | 🧑‍🍳_${uuid} | <meat>      |
      
    Then expected records
      | topic_alias | key        | value  |
      | burger-out  | 🧑‍🍳_${uuid} | result |

    And assert result $ == "<output>"

    Examples:
      | bread | vegetable | meat | output       |
      | 🍞    | 🍅        | 🥩   | 🍔           |
      | 🍞    | 🍅        | 🍗   | 🍔           |
      | 🍞    | 🍅        | 🐟   | 🍔           |
      | 🍞    | 🥕        | 🥩   | 🍞 + 🥕 + 🥩 |

dbt text

karatelabs

Karate - Example

Feature: karate 'hello world' example
  
  Scenario: create and retrieve a cat
    Given url 'http://myhost.com/v1/cats'
    And request { name: 'Billie' }
    When method post
    Then status 201
    And match response == { id: '#notnull', name: 'Billie' }
    
    Given path response.id
    When method get
    Then status 200

Karate - HTTP
Snowflake - JDBC SQL
🤔

Karate/Snowflake - Example

Feature: SELECT V1
  Background:
    * url "https://<SNOWFLAKE_ACCOUNT>.snowflakecomputing.com/api/v2"
    * string jwtToken = karate.exec("snow connection generate-jwt --silent --account <SNOWFLAKE_ACCOUNT> --user <SNOWFLAKE_USER> --private-key-file <SNOWFLAKE_USER_PEM>")
    * header Content-Type = "application/json"
    * header Accept = "application/json"
    * header Authorization = "Bearer " + jwtToken
    * header User-Agent = "<MY_APP_IT>"
    * header X-Snowflake-Authorization-Token-Type = "KEYPAIR_JWT"

  Scenario: Select 1 cutter
    Given path "statements"
    And text payload =
        """
        {
          "statement": "SELECT SERIAL_NUMBER, CUTTER_TYPE FROM CUTTER WHERE SERIAL_NUMBER='MY_VECTOR'",
          "timeout": 60,
          "role": "<SNOWFLAKE_ROLE>",
          "warehouse": "<SNOWFLAKE_WAREHOUSE>",
          "database": "<SNOWFLAKE_DATABASE>",
          "schema": "<SNOWFLAKE_SCHEMA>"
        }
        """
    When request payload
    And method post
    Then status 200
    And match response.resultSetMetaData.numRows == 1
    And match response.resultSetMetaData.rowType[0].name == "SERIAL_NUMBER"
    And match response.resultSetMetaData.rowType[1].name == "CUTTER_TYPE"
    And match response.data[0][0] == "MY_VECTOR"
    And match response.data[0][1] == "VECTOR"

Complex for a SELECT…​😠
Run Task & DBT ?🤔

data pipeline elt
  • Karate DSL extension - Examples

  • karate.exec(<any_command>)

  • 😃

💡Lectra karate-data
💪😎👍

  • Fat JAR (karate.jar + extensions) or Docker Image (with all you need)

  • Retry / timeout configuration

  • Clone / Drop schema for each Scenario

  • Other advanced features

  • 😃

Karate/Snowflake - Example

Feature: SELECT V2
  Background:
    * json cliConfig = read('classpath:cli-config.json')
    * json snowflakeConfig = read('classpath:snowflake-config.json')
    * string jwtToken = snowflake.cli.generateJwtToken(cliConfig)
    * json restConfig = {...cliConfig, ...snowflakeConfig, jwtToken: jwtToken}

  Scenario: Select 1 cutter
    Given text statement =
    """
      SELECT SERIAL_NUMBER, CUTTER_TYPE
      FROM CUTTER
      WHERE SERIAL_NUMBER='MY_VECTOR'
    """
    And def response = snowflake.rest.runSql({...restConfig, statement: statement})
    And table expectedData
      | SERIAL_NUMBER | CUTTER_TYPE |
      | "MY_VECTOR"   | "VECTOR"    |
    And match response.data == expectedData

+ karate-config.js for configuration

snowflake DSL

cli
snowflake.cli.generateJwtToken({ account: "...", user: "...", privateKeyPath: "...", privateKeyPassphrase: "..." })
snowflake.cli.runSql({ statement: "...", cliConfig: { ... }, snowflakeConfig: { ... } })
...
rest
snowflake.rest.runSql({ statement: "...", cliConfig: {... }, snowflakeConfig: { ... }, jwtToken: "... Optional" })
...

kubernetes DSL

TODO

rabbitmq DSL

TODO

Demo Time 🎬

Kafka version
burger factory

Snowflake version
burger factory

DDL

ddl

burger-factory.feature

Feature: Demo
  Background:
    * json cliConfig = snowflake.cliConfigFromEnv
    * string jwtToken = snowflake.cli.generateJwtToken(cliConfig)
    * json restConfig = ({...cliConfig, jwtToken: jwtToken}) 
    * string clientId = "😋_"+lectra.uuid()
    * def genStatement = (table, value) => "INSERT INTO "+table+"(CLIENT_ID, VALUE) VALUES ('"+clientId+"','"+value+"')"

  Scenario Outline: Burger Factory - <bread> + <vegetable> + <meat> = <output>
    Given table inserts
      | table       | value         | config                     |
      | "BREAD"     | "<bread>"     | snowflakeConfigs.BREAD     |
      | "VEGETABLE" | "<vegetable>" | snowflakeConfigs.VEGETABLE |
      | "MEAT"      | "<meat>"      | snowflakeConfigs.MEAT      |
    And json responses = karate.map(inserts, (row) => snowflake.rest.runSql({...restConfig, snowflakeConfig: row.config, statement: genStatement(row.table, row.value)}).status)
    And match each responses == "OK"

    When string dbtConsoleOutput = karate.exec("dbt run")
    And match dbtConsoleOutput contains "Completed successfully"

    Then string selectStatement = "SELECT VALUE FROM BURGER WHERE CLIENT_ID='"+clientId+"'"
    And json response = snowflake.rest.runSql({...restConfig, snowflakeConfig: snowflakeConfigs.BURGER, statement: selectStatement })
    And match response.data == [ { "VALUE" : "<output>" } ]

    Examples:
      | bread | vegetable | meat | output       |
      | 🍞    | 🍅        | 🥩   | 🍔           |
      | 🍞    | 🍅        | 🍗   | 🍔           |
      | 🍞    | 🍅        | 🐟   | 🍔           |
      | 🍞    | 🥕        | 🥩   | 🍞 + 🥕 + 🥩 |

Clone schemas…​

Feature: Demo - Clone Schemas
  Background:
    * json cliConfig = snowflake.cliConfigFromEnv
    * string jwtToken = snowflake.cli.generateJwtToken(cliConfig)
    * json restConfig = ({...cliConfig, snowflakeConfig: snowflakeConfigs.BREAD, jwtToken: jwtToken})
    * string clientId = "😋_"+lectra.uuid()
    * def genStatement = (table, value) => "INSERT INTO "+table+"(CLIENT_ID, VALUE) VALUES ('"+clientId+"','"+value+"')"
    * json cloneResult = cloneSnowflakeConfigs(restConfig)
    * configure afterScenario = function(){ dropSnowflakeConfigs(restConfig, cloneResult.snowflakeConfigs) }

  Scenario Outline: Burger Factory - <bread> + <vegetable> + <meat> = <output>
    Given table inserts
      | table       | value         | config                                 |
      | "BREAD"     | "<bread>"     | cloneResult.snowflakeConfigs.BREAD     |
      | "VEGETABLE" | "<vegetable>" | cloneResult.snowflakeConfigs.VEGETABLE |
      | "MEAT"      | "<meat>"      | cloneResult.snowflakeConfigs.MEAT      |
    And json responses = karate.map(inserts, (row) => snowflake.rest.runSql({...restConfig, snowflakeConfig: row.config, statement: genStatement(row.table, row.value)}).status)
    And match each responses == "OK"

    * string cmd = cloneResult.dbtPrefix+" dbt run"
    When string dbtConsoleOutput = karate.exec("bash -c '"+cmd+"'")
    And match dbtConsoleOutput contains "Completed successfully"

    Then string selectStatement = "SELECT VALUE FROM BURGER WHERE CLIENT_ID='"+clientId+"'"
    And json response = snowflake.rest.runSql({...cliConfig, snowflakeConfig: cloneResult.snowflakeConfigs.BURGER, statement: selectStatement })
    And match response.data == [ { "VALUE" : "<output>" } ]

    Examples:
      | bread | vegetable | meat | output       |
      | 🍞    | 🍅        | 🥩   | 🍔           |
      | 🍞    | 🍅        | 🍗   | 🍔           |
      | 🍞    | 🍅        | 🐟   | 🍔           |
      | 🍞    | 🥕        | 🥩   | 🍞 + 🥕 + 🥩 |

Next Steps 🚀

  • Open Source

🎉 Thank you 🙏

Questions ?

Feedback TODO